optimizing binary size
2022-02-20 · 6 min read
Setup #
# for running cargo bloat
$ RUSTFLAGS="-C target-cpu=native" cargo install cargo-bloat
# for running cargo size (among other things)
$ rustup component add llvm-tools-preview
$ rustup +nightly component add llvm-tools-preview # (for strip=symbols)
$ RUSTFLAGS="-C target-cpu=native" cargo install cargo-binutils
TL;DR #
# compile and sort functions by binary size
$ cargo bloat --release -n 100
# only like 20-25% of the binary size seems to be our code or other relevant
# stuff like ndarray. The rest seems to be mostly panic and fmt
# infrastructure...
# compile and sort crates by binary size
$ cargo bloat --release --crates
# print the size of each section in the binary
$ cargo size --bin my-bin --release -- -A
# if you already have a built rust binary, you can run
# rust-size directly:
$ rust-size -A target/release/my-bin
# rust-size with nice sorted and human-readable output:
$ rust-size -A target/release/my-bin \
| tail -n +2 \
| sort --numeric-sort --key=2 \
| numfmt --header=3 --to=iec-i --suffix=B --field=2
# strip debug info and all symbols (requires nightly) then print section size
$ RUSTFLAGS="-Z strip=symbols -C target-cpu=native" cargo +nightly \
size --bin my-bin --release -- -A
# before: 1.8 MiB! looking at the sections, it's mostly debug info.
# "-Z strip=symbols" brings this down to like 330-400 KiB (depending on
# other flags etc...)
# TODO: cargo-binutils also installed `cargo strip`; maybe that's helpful?
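The `--header=3` trick above depends on exactly how many non-numeric lines sort to the top, which is easy to get wrong. A rough sketch of a sturdier variant that filters for numeric rows instead; `section_sizes` is a made-up helper name (not part of cargo-binutils), and it assumes `rust-size -A` keeps printing the size in the second column:

```shell
# Hypothetical helper (not part of cargo-binutils): reads `rust-size -A`
# output on stdin, keeps only rows whose second column is a plain number,
# and prints "section size" sorted by size, smallest first.
section_sizes() {
    awk 'NF >= 2 && $2 ~ /^[0-9]+$/ { print $1, $2 }' \
        | sort -n -k 2
}

# Usage, commented out so the block has no side effects:
#   rust-size -A target/release/my-bin \
#       | section_sizes \
#       | numfmt --to=iec-i --suffix=B --field=2
```

Since only numeric rows survive the filter, `numfmt` no longer needs a `--header` count at all.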
Cargo.toml #
[profile.release]
codegen-units = 1
lto = true
panic = "abort"
# opt-level = "s" # optimize for binary size
# opt-level = "z" # optimize for binary size, and also turn off loop vectorization
opt-level = 3
debug = 0
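To compare the `opt-level` choices without editing Cargo.toml each time, one option is cargo's `CARGO_PROFILE_RELEASE_OPT_LEVEL` environment override. A minimal sketch, assuming the binary is called `my-bin` as in the commands above; `file_bytes` is a made-up helper:

```shell
# Hypothetical helper: print the size of FILE in bytes.
# `wc -c` is used instead of `stat` because its flags are the same everywhere.
file_bytes() {
    echo "$(( $(wc -c < "$1") ))"
}

# Hypothetical comparison loop, commented out so the block has no side effects:
#   for level in 3 s z; do
#       CARGO_PROFILE_RELEASE_OPT_LEVEL=$level cargo build --release
#       echo "opt-level=$level: $(file_bytes target/release/my-bin) bytes"
#   done
```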
Compile std with panic = "abort" #
- shaves off maybe 150 KiB?
- removes a decent chunk of the backtrace/unwind infrastructure
# .cargo/config.toml
[unstable]
build-std = ["std", "panic_abort"]
build-std-features = [] # <- turns off backtrace+unwind features
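As far as I know, `build-std` needs the `rust-src` component on nightly and an explicit `--target`, even when building for the host. A small sketch for grabbing the host triple; `parse_host_triple` is a made-up helper, and the build commands are left as comments:

```shell
# Hypothetical helper: extract the host target triple from `rustc -vV`
# output on stdin (the line that looks like "host: x86_64-unknown-linux-gnu").
parse_host_triple() {
    sed -n 's/^host: //p'
}

# Usage, commented out so the block has no side effects:
#   rustup component add rust-src --toolchain nightly
#   cargo +nightly build --release --target "$(rustc -vV | parse_host_triple)"
```

With `--target` set, the output lands under `target/<triple>/release/` rather than `target/release/`.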
WASM #
https://rustwasm.github.io/twiggy/
Example: fixing bloat #
Let me just run a quick smoketest (which depends on almost every crate in the monorepo)...
$ cargo test -p smoketest
# ..
Finished test [unoptimized + debuginfo] target(s) in 1m 28s
Running unittests src/lib.rs (target/debug/deps/smoketest-bd637d7668a0b714)
# ..
Man that sure took a while to link, I wonder how big the binary is?
$ ls -lah target/debug/deps/smoketest-bd637d7668a0b714
-rwxrwxr-x 1 phlip9 phlip9 880M May 4 11:19 target/debug/deps/smoketest-bd637d7668a0b714
JESUS. RIP MY SSD.
$ rustup component add llvm-tools-preview
$ rust-size -A target/debug/deps/smoketest-bd637d7668a0b714 \
| tail -n +2 \
| sort --numeric-sort --key=2 \
| numfmt --header=3 --to=iec-i --suffix=B --field=2
section size addr
.fini_array 8 63669912
.fini 13 52345408
.init_array 16 63669896
.plt.got 24B 3608704
.init 27B 3608576
.interp 28B 848
.note.ABI-tag 32B 948
.note.gnu.property 32B 880
.debug_gdb_scripts 34B 55423896
.note.gnu.build-id 36B 912
.comment 43B 0
.gnu.hash 48B 984
.tdata 72B 63669824
.plt 96B 3608608
.rela.plt 120B 3605536
.gnu.version 318B 7088
.gnu.version_r 432B 7408
.dynamic 544B 64809064
.tbss 696B 63669896
.bss 2.1KiB 65535456
.dynstr 2.2KiB 4848
.dynsym 3.8KiB 1032
.debug_macro 12KiB 0
.data 24KiB 65511424
.got 686KiB 64809608
.data.rel.ro 1.1MiB 63669920
.gcc_except_table 1.4MiB 62213696
.eh_frame_hdr 1.5MiB 55423932
.debug_abbrev 2.3MiB 0
.rodata 3.0MiB 52346880
.rela.dyn 3.5MiB 7840
.debug_loc 3.5MiB 0
.debug_aranges 4.9MiB 0
.eh_frame 5.1MiB 56953128
.debug_ranges 15MiB 0
.debug_line 28MiB 0
.text 47MiB 3608768
.debug_pubnames 112MiB 0
.debug_str 175MiB 0
.debug_info 175MiB 0
.debug_pubtypes 275MiB 0
Total 851MiB
WTF IS GOING ON WITH THE .debug_pubtypes SECTION???
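Adding up the rounded `.debug_*` rows above gives roughly 790MiB of the 851MiB total, so over 90% of this binary is debug info. A quick sketch for computing that share directly from the raw `rust-size -A` output; `debug_share` is a made-up helper name:

```shell
# Hypothetical helper: reads `rust-size -A` output on stdin and prints the
# percentage of the total size taken up by `.debug_*` sections.
debug_share() {
    awk '$1 ~ /^\.debug_/ && $2 ~ /^[0-9]+$/ { debug += $2 }
         $1 == "Total"                        { total = $2 }
         END { if (total > 0) printf "%.1f\n", 100 * debug / total }'
}

# Usage, commented out so the block has no side effects:
#   rust-size -A target/debug/deps/smoketest-bd637d7668a0b714 | debug_share
```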
Ok ok, let's take a look at what we're working with...
$ sudo apt install dwarfdump
$ dwarfdump --print-type --format-suppress-offsets target/debug/deps/smoketest-bd637d7668a0b714 \
| head -n 10
.debug_pubtypes
'ErrorData<alloc::boxed::Box<std::io::error::Custom, alloc::alloc::Global>>'
'alloc::boxed::Box<std::io::error::Custom, alloc::alloc::Global>'
'alloc::boxed::Box<(dyn core::error::Error + core::marker::Send + core::marker::Sync), alloc::alloc::Global>'
'Result<(), std::io::error::Error>'
'NonNull<u8>'
'u8'
'SimpleMessage'
'ErrorKind'
How many types we got?
$ dwarfdump --print-type --format-suppress-offsets target/debug/deps/smoketest-bd637d7668a0b714 \
| wc -l \
| numfmt --to=si
2.3M
Maybe there's some giga types?
$ dwarfdump --print-type --format-suppress-offsets target/debug/deps/smoketest-bd637d7668a0b714 \
| awk '{ print length, $0 }' \
| sort -n -r \
> smoketest_debug_pubtypes
$ head -n 10 smoketest_debug_pubtypes
72011 '{closure_env#0}<&str, &str, n ..
71962 '&mut (nom::sequence::terminat ..
71957 '(nom::sequence::terminated::{ ..
71957 '(nom::sequence::terminated::{ ..
66740 '&mut nom::branch::alt::{closu ..
66717 '{closure_env#0}<&str, &str, n ..
66717 '{closure_env#0}<&str, &str, n ..
66668 '&mut (nom::sequence::terminat ..
66663 '(nom::sequence::terminated::{ ..
66663 '(nom::sequence::terminated::{ ..
Ok, despite nom taking the top 10, it looks like the primary culprit is my arch nemesis warp. CURSE YOU WARP AND YOUR COMPOSABLE GENERICS.
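The top 10 only shows the extremes. To see whether the bytes come from a few giga types or from a long tail of medium-sized ones, a rough histogram by order of magnitude helps. A sketch, assuming the `LENGTH 'type name'` line format of `smoketest_debug_pubtypes` from above; `length_histogram` is a made-up helper:

```shell
# Hypothetical helper: bucket each line of FILE by the number of digits in
# its length column, then print "digits count total_bytes" per bucket.
length_histogram() {
    awk '{ d = length($1); count[d]++; bytes[d] += $1 }
         END {
             for (d = 1; d <= 9; d++)
                 if (count[d]) printf "%d %d %.0f\n", d, count[d], bytes[d]
         }' "$1"
}

# Usage, commented out so the block has no side effects:
#   length_histogram smoketest_debug_pubtypes
```

A bucket of `5` covers names between 10,000 and 99,999 characters, which is where the nom entries above would land.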
Let's see what proportion of our .debug_pubtypes is warp...
$ cat smoketest_debug_pubtypes \
| cut -d " " -f1 - \
| awk '{sum += $1} END {print sum}' \
| numfmt --to=iec-i --suffix=B
271MiB
$ grep "nom" smoketest_debug_pubtypes \
| cut -d " " -f1 - \
| awk '{sum += $1} END {print sum}' \
| numfmt --to=iec-i --suffix=B
3.3MiB
$ grep "warp" smoketest_debug_pubtypes \
| cut -d " " -f1 - \
| awk '{sum += $1} END {print sum}' \
| numfmt --to=iec-i --suffix=B
82MiB
$ grep "lightning" smoketest_debug_pubtypes \
| cut -d " " -f1 - \
| awk '{sum += $1} END {print sum}' \
| numfmt --to=iec-i --suffix=B
45MiB
$ grep "proptest" smoketest_debug_pubtypes \
| cut -d " " -f1 - \
| awk '{sum += $1} END {print sum}' \
| numfmt --to=iec-i --suffix=B
5MiB
$ grep "Vec" smoketest_debug_pubtypes \
| cut -d " " -f1 - \
| awk '{sum += $1} END {print sum}' \
| numfmt --to=iec-i --suffix=B
71MiB
$ grep "hyper" smoketest_debug_pubtypes \
| cut -d " " -f1 - \
| awk '{sum += $1} END {print sum}' \
| numfmt --to=iec-i --suffix=B
95MiB
$ grep "tokio" smoketest_debug_pubtypes \
| cut -d " " -f1 - \
| awk '{sum += $1} END {print sum}' \
| numfmt --to=iec-i --suffix=B
95MiB
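The same pipeline repeats for every crate, so it can be folded into one function. A minimal sketch; `pubtypes_bytes` is a made-up name, and it expects the `LENGTH 'type name'` format of `smoketest_debug_pubtypes`. Note that the buckets overlap: a type name that mentions both warp and hyper is counted under both, which is presumably why the per-crate numbers add up to more than the 271MiB total.

```shell
# Hypothetical helper: sum the length column (field 1) of every line in FILE
# that matches PATTERN. Pass '' as PATTERN to sum every line.
pubtypes_bytes() {
    pattern=$1
    file=$2
    grep -- "$pattern" "$file" \
        | awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Usage, commented out so the block has no side effects:
#   for name in nom warp lightning proptest Vec hyper tokio; do
#       printf '%s ' "$name"
#       pubtypes_bytes "$name" smoketest_debug_pubtypes \
#           | numfmt --to=iec-i --suffix=B
#   done
```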
A bunch of duplicates...
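If "duplicates" here means identical entries showing up more than once (as the repeated lengths in the top 10 suggest), the worst offenders can be listed with `sort` and `uniq -c`. A sketch, again assuming the `smoketest_debug_pubtypes` file from above; `top_duplicates` is a made-up helper:

```shell
# Hypothetical helper: print the N most-repeated lines of FILE (default 10),
# each prefixed with how many times it occurs.
top_duplicates() {
    sort "$1" \
        | uniq -c \
        | sort -n -r \
        | head -n "${2:-10}"
}

# Usage, commented out so the block has no side effects:
#   top_duplicates smoketest_debug_pubtypes 5 | cut -b -80
```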
Checking out the .debug_str section:
$ dwarfdump --print-strings --format-suppress-offsets target/debug/deps/smoketest-bafe2762ec60a400 \
| sort --numeric-sort --key=3 \
| cut -b -80 \
| tail -n 10
name: length 67206 is 'pin<futures_util::future::try_future::into_future::IntoFu
name: length 67220 is 'get_unchecked_mut<futures_util::future::try_future::into_
name: length 67222 is 'leak<alloc::sync::ArcInner<warp::filter::boxed::BoxingFil
name: length 67229 is 'from<futures_util::future::try_future::into_future::IntoF
name: length 67233 is 'into_pin<futures_util::future::try_future::into_future::I
name: length 67240 is 'new<alloc::boxed::Box<alloc::sync::ArcInner<warp::filter:
name: length 67257 is 'new_unchecked<alloc::boxed::Box<futures_util::future::try
name: length 71736 is 'choice<&str, &str, nom::error::Error<&str>, nom::sequence
name: length 134431 is 'into<&mut alloc::sync::ArcInner<warp::filter::boxed::Box
name: length 134508 is 'into<alloc::boxed::Box<futures_util::future::try_future: